Abstract: In this article, we analyze the SPICE method developed
in [1], and establish its connections with other standard
sparse estimation methods such as the Lasso and the LAD-Lasso. 
This result positions SPICE as a computationally efficient tech- 
nique for the calculation of Lasso-type estimators. Conversely, 
this connection is very useful for establishing the asymptotic 
properties of SPICE under several problem scenarios and for 
suggesting suitable modifications in cases where the naive version 
of SPICE would not work. 



I. Introduction 

SPECTRAL line estimation, or the problem of estimat- 
ing the amplitudes and frequencies of a signal com- 
posed of a sum of sinusoids contaminated by Gaussian 
white noise, is a ubiquitous and well studied area in the 
field of signal processing [2]. Many classes of methods
have been devised to solve this problem under several different
scenarios, such as uniformly/non-uniformly spaced
samples, a priori known/unknown number of sinusoids,
homoscedastic/heteroscedastic (constant/varying variance) samples,
parametric/non-parametric model-based, and so on [2], [3], [4].

Recently, SPICE (SemiParametric/SParse Iterative 
Covariance-based Estimator), a new technique for spectral 
line estimation inspired by ideas from sparse estimation, has 
been proposed in [1]. This method is capable of handling
irregularly sampled data. Similarly, a version of SPICE
has also been developed for array signal processing [5], a
mathematically almost equivalent problem [2, Chapter 6].

In this paper, we establish the connection between SPICE 
and standard sparse estimation methods such as the Lasso [6] 
and the LAD-Lasso [7]. This connection, based on the so-called
Elfving theorem from optimal experiment design [8],
puts the SPICE method into perspective, allowing us to 
examine the asymptotic properties of SPICE under several 
scenarios by simply applying the existing theory for the Lasso 
and its variants (see, e.g., the recent book [9]). Conversely, 
the relationship between SPICE and Lasso-type estimators 
suggests that SPICE may be used as a (new) numerically 
efficient technique for computing Lasso estimates. 

The manuscript is organized as follows. Section II describes
the spectral line estimation problem and the SPICE method.
Section III establishes the relation between SPICE and Lasso-type
sparse estimation methods. In Section IV a simulation
example illustrating the equivalence between SPICE and a
version of Lasso is presented. Finally, Section V concludes
the paper.

Notation: Vectors and matrices are written in bold lowercase and uppercase fonts, respectively. $^T$ and $^H$ denote transposition and complex conjugate transposition, respectively. $\operatorname{Re} z$ and $\operatorname{Im} z$ stand for the real and imaginary parts of the complex number $z$, and $j$ is the square root of $-1$. $\mathbb{R}_0^+$ is the set of nonnegative real numbers, and $\mathbb{C}$ is the complex plane. $\|\cdot\|_1$, $\|\cdot\|_2$, $\|\cdot\|_F$ and $|\cdot|$ correspond to the 1-norm, Euclidean norm, Frobenius norm and absolute value, respectively. $\operatorname{diag}(a_1, \dots, a_n)$ is a diagonal matrix whose diagonal is given by $a_1, \dots, a_n$. $I$ is the identity matrix. $E\{\cdot\}$ denotes mathematical expectation.

II. Problem Formulation and SPICE method 

Consider the following problem: Let $y \in \mathbb{C}^{N \times 1}$ be given, satisfying the equation

$$y = \sum_{k=1}^{K} a_k s_k + e, \tag{1}$$

where $e \in \mathbb{C}^{N \times 1}$ is a complex Gaussian random vector of zero mean and covariance matrix $\operatorname{diag}(\sigma_1, \dots, \sigma_N)$, and $\{a_k\}_{k=1}^{K} \subset \mathbb{C}^{N \times 1}$ are known complex vectors. $\{s_k\}_{k=1}^{K} \subset \mathbb{C}$ are unknown complex quantities, of the form $s_k = |s_k| e^{j\phi_k}$, where the phases $\{\phi_k\}_{k=1}^{K} \subset [0, 2\pi)$ are independent random variables uniformly distributed in $[0, 2\pi)$, and the magnitudes $\{|s_k|\}_{k=1}^{K} \subset \mathbb{R}_0^+$ are deterministic parameters to be estimated. The spectral line estimation problem considers a particular case of (1), where the $a_k$'s are vectors of imaginary exponentials of the form $e^{j\omega t}$.

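In order to estimate the magnitudes $|s_k|$, let

$$R := E\{y y^H\} = A^H P A,$$

where

$$A^H := [a_1 \ \cdots \ a_K \ \ I] =: [a_1 \ \cdots \ a_{K+N}],$$

$$P := \operatorname{diag}(|s_1|^2, \dots, |s_K|^2, \sigma_1, \dots, \sigma_N) =: \operatorname{diag}(p_1, \dots, p_{K+N}).$$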

The SPICE estimate [1] of the $|s_k|$'s is an iterative procedure of the form:

$$R(i) = A^H \operatorname{diag}(p_1(i), \dots, p_{K+N}(i)) A,$$

$$p_k(i+1) = p_k(i) \frac{|a_k^H R^{-1}(i) y|}{w_k^{1/2} \rho(i)}, \qquad w_k := \frac{\|a_k\|_2^2}{\|y\|_2^2}, \tag{2}$$

$$\rho(i) = \sum_{l=1}^{K+N} w_l^{1/2} p_l(i) |a_l^H R^{-1}(i) y|,$$

where $i$ is the iteration number, and $p_k(i)$ is the estimate of $p_k$ at iteration $i$. This method is initialized by any initial estimate of the $p_k$'s, and its estimate $R(i)$ converges to the matrix $R$ minimizing

$$f(R) := \|R^{-1/2}(y y^H - R)\|_F^2. \tag{3}$$

The $p_k$'s that give $R$ correspond to the limits $\lim_{i \to \infty} p_k(i)$.
Remark 1: The presence of the inverse of $R(i)$ in the
SPICE method may in principle lead to complications if such
a matrix becomes singular. However, if the $p_k(0)$'s are chosen
to be strictly positive, then $R(i+1)$ is generically non-singular
(since is generically in the column range of $R(i)$, and $y$
is a Gaussian random vector which lies in the null space of
$R(i)$ with probability 0). Because of this, here and in the
sequel we will implicitly assume for the derivations that $R$ is
non-singular.
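
The iteration (2) is compact enough to sketch in code. The following NumPy sketch is only an illustration, not the reference implementation of [1]: the function name `spice`, the all-ones initialization and the fixed iteration count are assumptions made here, and the known vectors $a_1, \dots, a_K$ are passed as the columns of a matrix called `A_sig`.

```python
import numpy as np


def spice(y, A_sig, n_iter=100):
    """Sketch of the SPICE iteration (2), allowing different noise variances.

    y      : (N,) complex data vector
    A_sig  : (N, K) complex matrix whose columns are a_1, ..., a_K
    n_iter : number of iterations (hypothetical default)
    Returns the vector p of length K+N and the weights w.
    """
    N, K = A_sig.shape
    # A^H = [a_1 ... a_K  I]: signal columns followed by one column per noise variance
    Ah = np.hstack([A_sig, np.eye(N)]).astype(complex)
    y_norm2 = np.vdot(y, y).real                      # ||y||_2^2
    w = np.sum(np.abs(Ah) ** 2, axis=0) / y_norm2     # w_k = ||a_k||_2^2 / ||y||_2^2
    p = np.ones(K + N)                                # strictly positive start (assumption)
    for _ in range(n_iter):
        R = (Ah * p) @ Ah.conj().T                    # R(i) = sum_k p_k(i) a_k a_k^H
        z = np.abs(Ah.conj().T @ np.linalg.solve(R, y))   # |a_k^H R^{-1}(i) y|
        rho = np.sum(np.sqrt(w) * p * z)              # rho(i)
        p = p * z / (np.sqrt(w) * rho)                # p_k(i+1)
    return p, w
```

One property that follows directly from (2) is that $\sum_k w_k p_k(i+1) = 1$ after every update, because $\rho(i)$ normalizes the weighted sum; this gives a cheap sanity check for any implementation.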

Remark 2: In [5], SPICE was defined based on a slightly
different $f(R)$. We will not consider that version of SPICE,
because such a version can only be defined in a multi-snapshot
case. However, similar steps as the ones described in the
following sections can be applied to the method in [5] to arrive
at an equivalent Lasso-type formulation.

III. Analysis of SPICE 

The first version of SPICE in [1] allows the variances $\sigma_k$
to be different, while a variant of the method imposes the
constraint that $\sigma_1 = \cdots = \sigma_N =: \sigma$ [1, Section III.D]. We will
treat these cases separately, starting with the case where the
variances can be different.

A. Different variances 

As shown in [1], the function $f$ in (3) can be written as

$$f(R) = \operatorname{tr}\{[R^{-1/2}(y y^H - R)]^H R^{-1/2}(y y^H - R)\} = \|y\|_2^2 \, y^H R^{-1} y - 2\|y\|_2^2 + \operatorname{tr} R,$$

hence minimizing $f(R)$ is equivalent to minimizing

$$g(R) := y^H R^{-1} y + \frac{1}{\|y\|_2^2} \operatorname{tr} R = y^H R^{-1} y + \sum_{k=1}^{K+N} w_k p_k \tag{4}$$

subject to $p_k \ge 0$, where

$$w_k := \frac{\|a_k\|_2^2}{\|y\|_2^2}.$$
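
The expansion of $f(R)$ and its relation to $g(R)$ can be checked numerically. The sketch below draws an arbitrary Hermitian positive definite `R` and a random `y` (both are illustrative choices, not data from the paper) and compares the Frobenius-norm definition (3) with the expanded expression and with $\|y\|_2^2 \, g(R) - 2\|y\|_2^2$.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 5
y = rng.standard_normal(N) + 1j * rng.standard_normal(N)
B = rng.standard_normal((N, N)) + 1j * rng.standard_normal((N, N))
R = B @ B.conj().T + np.eye(N)          # arbitrary Hermitian positive definite R

# R^{-1/2} from the eigendecomposition R = U diag(evals) U^H
evals, U = np.linalg.eigh(R)
R_inv_sqrt = (U / np.sqrt(evals)) @ U.conj().T

Y = np.outer(y, y.conj())               # y y^H
f = np.linalg.norm(R_inv_sqrt @ (Y - R), "fro") ** 2   # definition (3)

y2 = np.vdot(y, y).real                 # ||y||_2^2
quad = np.vdot(y, np.linalg.solve(R, y)).real          # y^H R^{-1} y
expanded = y2 * quad - 2 * y2 + np.trace(R).real
g = quad + np.trace(R).real / y2        # cost (4)

print(np.isclose(f, expanded), np.isclose(f, y2 * g - 2 * y2))
```

Since $f$ and $g$ differ only by a positive scaling and a constant that does not depend on $R$, both have the same minimizers.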



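To further simplify the problem, in [5, Appendix B] it is argued that the minimization of $g(R)$ is equivalent to solving

$$\begin{aligned} \min_{p_1, \dots, p_{K+N} \ge 0} \quad & y^H R^{-1} y \\ \text{s.t.} \quad & \sum_{k=1}^{K+N} w_k p_k = 1, \\ & \sum_{k=1}^{K+N} a_k a_k^H p_k = R. \end{aligned} \tag{5}$$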



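Equation (5) will be our starting point for the analysis of SPICE. A slight simplification can be achieved by defining $\tilde{p}_k := w_k p_k$ and $\tilde{a}_k := w_k^{-1/2} a_k$ for all $k = 1, \dots, K+N$. This gives the re-parameterized problem

$$\begin{aligned} \min_{\tilde{p}_1, \dots, \tilde{p}_{K+N} \ge 0} \quad & y^H R^{-1} y \\ \text{s.t.} \quad & \sum_{k=1}^{K+N} \tilde{p}_k = 1, \\ & \sum_{k=1}^{K+N} \tilde{a}_k \tilde{a}_k^H \tilde{p}_k = R. \end{aligned} \tag{6}$$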
The strategy now is to consider a derivation similar to Elfving's theorem, from optimal experiment design [8], to obtain an optimization problem equivalent to (6). First notice that

$$y^H R^{-1} y = \min_{c_1, \dots, c_{K+N}} {\sum_{k=1}^{K+N}}' \frac{|c_k|^2}{\tilde{p}_k} \quad \text{s.t.} \quad \tilde{A}^H c = y, \tag{7}$$

where $\tilde{A}^H := [\tilde{a}_1 \ \cdots \ \tilde{a}_{K+N}]$ and $c := [c_1 \ \cdots \ c_{K+N}]^T$. Here the $'$ symbol in the summation sign indicates that the values of $k$ for which $\tilde{p}_k = 0$ should be omitted from the sum. The proof of (7) is given in the appendix.

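The combination of (6) and (7) gives a minimization problem in $\{\tilde{p}_k\}$ and $\{c_k\}$, i.e.,

$$\begin{aligned} \min_{\substack{\tilde{p}_1, \dots, \tilde{p}_{K+N} \ge 0, \\ c_1, \dots, c_{K+N}}} \quad & {\sum_{k=1}^{K+N}}' \frac{|c_k|^2}{\tilde{p}_k} \\ \text{s.t.} \quad & \sum_{k=1}^{K+N} \tilde{p}_k = 1, \\ & \tilde{A}^H c = y, \end{aligned} \tag{8}$$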

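where the order of the minimizing variables can be exchanged. Now, when the $c_k$'s are kept fixed, the minimization of the cost in (8) with respect to $\{\tilde{p}_k\}$ can be done explicitly. To see this, notice that by the Cauchy-Schwarz inequality we have

$${\sum_{k=1}^{K+N}}' \frac{|c_k|^2}{\tilde{p}_k} = \left( {\sum_{k=1}^{K+N}}' \frac{|c_k|^2}{\tilde{p}_k} \right) \left( \sum_{k=1}^{K+N} \tilde{p}_k \right) \ge \left( \sum_{k=1}^{K+N} |c_k| \right)^2,$$

where the lower bound is attained if and only if there is a constant $\alpha > 0$ such that

$$\frac{|c_k|^2}{\tilde{p}_k} = \alpha \tilde{p}_k, \quad k = 1, \dots, K+N,$$

or

$$\tilde{p}_k = \frac{|c_k|}{\sqrt{\alpha}}, \quad k = 1, \dots, K+N.$$

The proportionality constant $\alpha$ can be determined from the condition $\sum_{k=1}^{K+N} \tilde{p}_k = 1$, giving

$$\tilde{p}_k = \frac{|c_k|}{\sum_{l=1}^{K+N} |c_l|}, \quad k = 1, \dots, K+N. \tag{9}$$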
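
The closed-form minimizer (9) is easy to confirm numerically. The sketch below fixes a random complex vector `c` (an illustrative choice), evaluates the cost of (8) at the candidate (9), and compares it with the cost at points drawn at random from the simplex; the Dirichlet sampling is only a convenient way of generating feasible points and is not part of the derivation.

```python
import numpy as np

rng = np.random.default_rng(2)
M = 6                                     # plays the role of K+N
c = rng.standard_normal(M) + 1j * rng.standard_normal(M)


def cost(p_tilde):
    """Cost of (8) for fixed c: sum_k |c_k|^2 / p_k."""
    return np.sum(np.abs(c) ** 2 / p_tilde)


p_star = np.abs(c) / np.sum(np.abs(c))    # candidate minimizer, equation (9)
best = cost(p_star)                       # should equal (sum_k |c_k|)^2

# Feasible comparison points: nonnegative entries that sum to one
trials = rng.dirichlet(np.ones(M), size=1000)
smallest_trial = min(cost(p) for p in trials)

print(np.isclose(best, np.sum(np.abs(c)) ** 2), smallest_trial >= best)
```

The attained value $(\sum_k |c_k|)^2$ is exactly the Cauchy-Schwarz lower bound, which is what turns (8) into an $\ell_1$-type problem in $c$ alone.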
Putting this expression in (8) gives the reduced problem

$$\min_{c_1, \dots, c_{K+N}} \left( \sum_{k=1}^{K+N} |c_k| \right)^2 \quad \text{s.t.} \quad \tilde{A}^H c = y,$$

or, equivalently,

$$\min_{c_1, \dots, c_{K+N}} \sum_{k=1}^{K+N} |c_k| \quad \text{s.t.} \quad \tilde{A}^H c = y. \tag{10}$$

This is a complex-valued $\ell_1$-optimization problem, hence it can be expected to give a sparse solution in $\{c_k\}$. This, in turn, gives a sparse solution in $\{\tilde{p}_k\}$ through (9), and thus in

$$p_k = \frac{\tilde{p}_k}{w_k} = \frac{|c_k| \, \|y\|_2^2}{\|a_k\|_2^2 \sum_{l=1}^{K+N} |c_l|}, \quad k = 1, \dots, K+N.$$

To explore the behavior of SPICE in more detail, we can notice, by denoting the first $K$ components of the $k$-th row of $\tilde{A}^H$ as $\varphi_k^H$, i.e., $\varphi_k^H := [(\tilde{a}_1)_k \ \cdots \ (\tilde{a}_K)_k]$, and observing that the constraints in (10) read $c_{K+j} = y_j - \varphi_j^H \tilde{c}$ for $j = 1, \dots, N$, that (10) is equivalent to

$$\min_{\tilde{c}} \sum_{k=1}^{N} |y_k - \varphi_k^H \tilde{c}| + \sum_{k=1}^{K} |c_k|,$$

where $\tilde{c} := [c_1 \ \cdots \ c_K]^T$, or more compactly

$$\min_{\tilde{c}} \|y - \Phi \tilde{c}\|_1 + \|\tilde{c}\|_1, \tag{11}$$

where $\Phi^H := [\varphi_1 \ \cdots \ \varphi_N]$, i.e., $\Phi$ corresponds to the first $K$ columns of $\tilde{A}^H$. Equation (11) is essentially a simplified (complex-valued) version of the LAD-Lasso [7] or the RLAD [10], where $\tilde{c}$ takes the role of a parameter vector, and the regressors have been scaled by $w_k^{-1/2} = \|y\|_2 / \|a_k\|_2$, so that their Euclidean norms are equal to $\|y\|_2$. The fact that the cost function in (11) considers the $\ell_1$ norm of the residuals $(y - \Phi \tilde{c})$ instead of their $\ell_2$ norm suggests that SPICE might be a robust estimator against outliers or errors with heavy-tailed distributions (since, heuristically speaking, it does not penalize large deviations of the residuals from zero, due mainly to outliers, as much as the $\ell_2$ norm); in fact, this is the reason why some authors have proposed the use of the LAD-Lasso instead of the standard Lasso in the presence of outliers [7].
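We can summarize these results in the following theorem:
Theorem 1: The limit value of the SPICE iterations (allowing for different $\sigma_k$), which corresponds to the minimizer of (3), is also given by the minimizer of (11), by performing the following change of variables:

$$p_k = \frac{\|y\|_2^2 \, |c_k|}{\|a_k\|_2^2 \left( \sum_{i=1}^{K} |c_i| + \sum_{i=1}^{N} |y_i - \varphi_i^H \tilde{c}| \right)}, \quad k = 1, \dots, K+N,$$

where $c_{K+j} := y_j - \varphi_j^H \tilde{c}$ for $j = 1, \dots, N$.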



B. Equal variances 

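Now we will analyze the variant of SPICE where the variances are constrained to be equal. The development in this case is exactly as in Section III-A until equation (8). At this point, the constraint $\sigma_1 = \cdots = \sigma_N =: \sigma$ implies that $\tilde{p}_{K+1} = \cdots = \tilde{p}_{K+N}$, which allows us to simplify (8) as

$$\begin{aligned} \min_{\substack{p'_1, \dots, p'_{K+1} \ge 0, \\ c_1, \dots, c_{K+N}}} \quad & \sum_{k=1}^{K} \frac{|c_k|^2}{p'_k} + \frac{N}{p'_{K+1}} \sum_{k=K+1}^{K+N} |c_k|^2 \\ \text{s.t.} \quad & \sum_{k=1}^{K+1} p'_k = 1, \\ & \tilde{A}^H c = y, \end{aligned}$$

where $p'_k = \tilde{p}_k$ for $k = 1, \dots, K$, $p'_{K+1} = N \tilde{p}_{K+1}$, and $c := [c_1 \ \cdots \ c_{K+N}]^T$. Now, the Cauchy-Schwarz argument used in Section III-A reveals that

$$p'_k = \begin{cases} \dfrac{|c_k|}{\sqrt{\alpha}}, & k = 1, \dots, K, \\[2ex] \sqrt{\dfrac{N}{\alpha} \displaystyle\sum_{l=K+1}^{K+N} |c_l|^2}, & k = K+1, \end{cases}$$

and from the condition $\sum_{k=1}^{K+1} p'_k = 1$ we obtain

$$\alpha = \left( \sum_{k=1}^{K} |c_k| + \sqrt{N \sum_{k=K+1}^{K+N} |c_k|^2} \right)^2.$$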
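The constants $c_k$, on the other hand, must be the solution of

$$\min_{c_1, \dots, c_{K+N}} \sum_{k=1}^{K} |c_k| + \sqrt{N \sum_{k=K+1}^{K+N} |c_k|^2} \quad \text{s.t.} \quad \tilde{A}^H c = y. \tag{12}$$

Just as in Section III-A, (12) can be rewritten as

$$\min_{\tilde{c}} \sqrt{N \sum_{k=1}^{N} |y_k - \varphi_k^H \tilde{c}|^2} + \|\tilde{c}\|_1,$$

where $\tilde{c} := [c_1 \ \cdots \ c_K]^T$, or

$$\min_{\tilde{c}} \sqrt{N} \|y - \Phi \tilde{c}\|_2 + \|\tilde{c}\|_1. \tag{13}$$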



Equation (13) is essentially a simplified (complex-valued) version of the standard Lasso [6], where $\tilde{c}$ takes the role of a parameter vector, and the Euclidean norms of the regressors have been equalized. We summarize these results as a theorem:
Theorem 2: The limit value of the SPICE iterations (imposing the constraint that $\sigma_1 = \cdots = \sigma_N$), which corresponds to the minimizer of (3), is also given by the minimizer of (13), by performing the following change of variables:

$$p_k = \frac{\|y\|_2^2 \, |c_k|}{\|a_k\|_2^2 \left( \|\tilde{c}\|_1 + \sqrt{N} \|y - \Phi \tilde{c}\|_2 \right)}, \quad k = 1, \dots, K,$$

$$p_{K+1} = \cdots = p_{K+N} = \frac{\|y\|_2^2 \, \|y - \Phi \tilde{c}\|_2}{\sqrt{N} \left( \|\tilde{c}\|_1 + \sqrt{N} \|y - \Phi \tilde{c}\|_2 \right)}.$$
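
The change of variables in Theorem 2 can be written as a small helper. The sketch below assumes the mapping has the form $p_k = \|y\|_2^2 |c_k| / (\|a_k\|_2^2 D)$ for $k \le K$ and $\sigma = \|y\|_2^2 \|y - \Phi\tilde{c}\|_2 / (\sqrt{N} D)$ with $D = \|\tilde{c}\|_1 + \sqrt{N}\|y - \Phi\tilde{c}\|_2$; the function name and argument layout are hypothetical, and the input `c` is an arbitrary vector rather than an actual minimizer of (13).

```python
import numpy as np


def theorem2_change_of_variables(c, Phi, y, a_norms):
    """Map a vector c (length K) to (p_1..p_K, sigma), following Theorem 2.

    Phi     : (N, K) matrix of scaled regressors (columns w_k^{-1/2} a_k)
    a_norms : (K,) Euclidean norms ||a_k||_2 of the unscaled vectors
    The formulas are a reading of Theorem 2 and should be treated as an assumption.
    """
    N = y.size
    y2 = np.vdot(y, y).real                      # ||y||_2^2
    resid = np.linalg.norm(y - Phi @ c)          # ||y - Phi c||_2
    denom = np.sum(np.abs(c)) + np.sqrt(N) * resid
    p = y2 * np.abs(c) / (a_norms ** 2 * denom)
    sigma = y2 * resid / (np.sqrt(N) * denom)
    return p, sigma


# Illustrative data (random, not from the paper)
rng = np.random.default_rng(4)
N, K = 6, 5
A_sig = rng.standard_normal((N, K)) + 1j * rng.standard_normal((N, K))
y = rng.standard_normal(N) + 1j * rng.standard_normal(N)
y2 = np.vdot(y, y).real
a_norms = np.linalg.norm(A_sig, axis=0)
Phi = A_sig * (np.sqrt(y2) / a_norms)            # columns scaled to norm ||y||_2
c = rng.standard_normal(K) + 1j * rng.standard_normal(K)

p, sigma = theorem2_change_of_variables(c, Phi, y, a_norms)
w = a_norms ** 2 / y2                            # weights of the signal columns
constraint = np.sum(w * p) + N * sigma / y2      # noise columns have w = 1/||y||_2^2
print(np.isclose(constraint, 1.0))
```

Under this reading, the mapped values satisfy the normalization $\sum_k w_k p_k = 1$ of (5) for any `c`, which is a useful consistency check on the formulas.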
The following remarks are appropriate: 
Remark 3: The results stated in Theorems 1 and 2 are
quite surprising, because they reveal that different assumptions 
on the noise variance produce versions of SPICE which are 
equivalent to two quite different but standard sparse estimators, 
namely the LAD-Lasso and the Lasso. 

Remark 4: Even though the equivalent Lasso formulations 
are not given in the same variables as the SPICE method, 
the required variable transformations (between the $c_k$'s and
the $p_k$'s) are simple scalings. This means that the sparsity
properties of SPICE are essentially the same as the ones for 
the equivalent Lasso estimators. 

Remark 5: The relations between the $c_k$'s and the $p_k$'s
given by Theorems 1 and 2 have a nontrivial structure, which
comes from the fact that SPICE considers the (unknown) noise
variances as parameters to be estimated, and puts them on the
same footing as the amplitudes of the spectral lines.

Remark 6: The cost function $g(R)$ minimized by SPICE in (4) can be interpreted as follows: The first term of $g(R)$, $y^H R^{-1} y$, is a model fit measure, while the second term, $\|y\|_2^{-2} \operatorname{tr} R$, can be interpreted as a trace heuristic or nuclear norm regularization (since $R = R^H \ge 0$, so the trace and nuclear norm coincide) [11]. This regularization term is known to encourage low rank matrices $R$, which, due to its structure, $R = A^H P A$, enforces the vector $[p_1, \dots, p_{K+N}]^T$ to be sparse. This interpretation thus provides an alternative heuristic justification for the sparsity-inducing behavior of SPICE.

Remark 7: Theorems 1 and 2 have been presented for the
complex-valued versions of SPICE. However, the derivations
in this section apply almost unaltered to real-valued problems.
This means that Theorems 1 and 2 establish Lasso-type
equivalences for the real-valued versions of SPICE as well. 
Notice, however, that the complex Lasso versions can be seen 
as real-valued Group Lasso estimators, as explained next. 

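Remark 8: The complex-valued nature of SPICE is inherited by its Lasso equivalents. Thus, for example, problem (13) does not behave as the standard (real-valued) Lasso, but as the (real-valued) Group Lasso [12]. To see this, let us define

$$y_R := \begin{bmatrix} \operatorname{Re} y \\ \operatorname{Im} y \end{bmatrix}, \qquad c_R := \begin{bmatrix} \operatorname{Re} \tilde{c} \\ \operatorname{Im} \tilde{c} \end{bmatrix}, \qquad \Phi_R := \begin{bmatrix} \operatorname{Re} \Phi & -\operatorname{Im} \Phi \\ \operatorname{Im} \Phi & \operatorname{Re} \Phi \end{bmatrix}.$$

Based on this notation, (13) can be written as

$$\min_{c_R} \sqrt{N} \|y_R - \Phi_R c_R\|_2 + \sum_{k=1}^{K} \left\| \begin{bmatrix} (c_R)_k \\ (c_R)_{k+K} \end{bmatrix} \right\|_2. \tag{14}$$

The second term in (14) is a sum of Euclidean norms, which promotes group sparsity, i.e., it tries to enforce that both the real and imaginary parts of individual entries of $\tilde{c}$ become zero simultaneously. Similarly, (11) corresponds to a grouped version of the LAD-Lasso.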
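
The real-valued rewriting in Remark 8 can be verified on random data. The sketch below (illustrative values only) stacks real and imaginary parts as in the definitions of $y_R$, $c_R$ and $\Phi_R$, and checks that the complex cost of (13) equals the Group Lasso cost of (14) with groups made of the pairs $((c_R)_k, (c_R)_{k+K})$.

```python
import numpy as np

rng = np.random.default_rng(3)
N, K = 6, 4
Phi = rng.standard_normal((N, K)) + 1j * rng.standard_normal((N, K))
c = rng.standard_normal(K) + 1j * rng.standard_normal(K)
y = rng.standard_normal(N) + 1j * rng.standard_normal(N)

# Real-valued stacking used in Remark 8
y_R = np.concatenate([y.real, y.imag])
c_R = np.concatenate([c.real, c.imag])
Phi_R = np.block([[Phi.real, -Phi.imag],
                  [Phi.imag, Phi.real]])

# Cost of the complex problem (13)
complex_cost = np.sqrt(N) * np.linalg.norm(y - Phi @ c) + np.sum(np.abs(c))

# Cost of the real Group Lasso form (14): one group per complex coefficient
groups = np.stack([c_R[:K], c_R[K:]], axis=1)    # row k holds (Re c_k, Im c_k)
group_cost = (np.sqrt(N) * np.linalg.norm(y_R - Phi_R @ c_R)
              + np.sum(np.linalg.norm(groups, axis=1)))

print(np.isclose(complex_cost, group_cost))
```

The agreement rests on two identities: $\Phi_R c_R$ stacks the real and imaginary parts of $\Phi\tilde{c}$, and $|c_k|$ is the Euclidean norm of the pair $(\operatorname{Re} c_k, \operatorname{Im} c_k)$.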

Remark 9: Recently, a re-weighted version of SPICE, 
called LIKES, has been proposed in [13]. We will not address
here the relation between LIKES and standard sparse estimators
(such as Sparse Bayesian Learning (SBL) and Automatic
Relevance Determination (ARD) [14]), because this has partly
been discussed in [13], and the equivalence to Lasso-type
estimators can be formally studied along the lines of [14].



Fig. 1. Spectrum obtained by SPICE and LAD-Lasso. (Legend: true spectrum, SPICE, LAD-Lasso; horizontal axis: frequency [Hz].)



IV. Simulation Example 

In this section, a numerical example, based on [1, Section
IV], is used to illustrate the equivalence between SPICE and
the LAD-Lasso, formally established in Theorem 1.

Let $y_k = y(t_k)$, $k = 1, \dots, N$, be the $k$-th sample, where the $t_k$'s are irregular time samples, drawn independently from a uniform distribution on $[0, 200]$. The basis functions considered here are of the form

$$a_k = [e^{j\omega_k t_1} \ \cdots \ e^{j\omega_k t_N}]^T,$$

where $\omega_k := 2\pi k / 1000$. Following [1], we take $N = 100$, and $y$ to be given by (1) with $K = 3$, $s_{145} = 3e^{j\phi_1}$, $s_{310} = 10e^{j\phi_2}$ and $s_{315} = 10e^{j\phi_3}$, and $s_k = 0$ otherwise. The phases $\phi_1$, $\phi_2$ and $\phi_3$ are independent random variables, uniformly distributed in $[0, 2\pi]$. The noise $e$ is assumed to have a covariance matrix $0.25I$.
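
A data set of this kind could be generated along the following lines. This is a sketch under stated assumptions: the frequency grid is taken to have 1000 points ($k = 1, \dots, 1000$), the noise is taken to be circularly symmetric, and the random seed is arbitrary; none of these choices is specified in the text above.

```python
import numpy as np

rng = np.random.default_rng(0)              # arbitrary seed (assumption)
N = 100                                     # number of samples
K_grid = 1000                               # assumed size of the frequency grid

t = rng.uniform(0.0, 200.0, N)              # irregular sampling times on [0, 200]
k = np.arange(1, K_grid + 1)
omega = 2 * np.pi * k / 1000                # omega_k = 2*pi*k/1000

# Column k-1 holds a_k = [exp(j*omega_k*t_1), ..., exp(j*omega_k*t_N)]^T
A_sig = np.exp(1j * np.outer(t, omega))

# Three nonzero spectral lines; indices in the text are 1-based
s = np.zeros(K_grid, dtype=complex)
phases = rng.uniform(0.0, 2 * np.pi, 3)
for idx, amp, ph in zip((145, 310, 315), (3.0, 10.0, 10.0), phases):
    s[idx - 1] = amp * np.exp(1j * ph)

# Complex Gaussian noise with covariance 0.25*I (circular symmetry assumed)
noise = np.sqrt(0.25 / 2) * (rng.standard_normal(N) + 1j * rng.standard_normal(N))
y = A_sig @ s + noise
```

The arrays `y` and `A_sig` are then the inputs that an implementation of the iteration (2) or a solver for (11) would take.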

The results of applying 100 iterations of SPICE, (2), and
its LAD-Lasso equivalent (11), solved using the CVX package
[15], are presented in Figure 1. As the figure shows,
both estimators practically coincide, their differences being
mainly due to numerical implementations. Notice also that
these estimators correctly detect the location of the peaks of
the true spectrum, even though the estimated amplitudes do
not approach their true values; this observation is consistent
with theoretical results regarding the bias of the Lasso and its
variants [9]. On a PC with a 2.53 GHz Intel Core Duo CPU
and 4 GB RAM, 100 iterations of SPICE take 23.0 s, while the
implementation of LAD-Lasso using CVX only takes 14.6 s.
However, if $N$ is further increased to 1000, CVX is incapable
of solving the LAD-Lasso problem, while SPICE can still
provide a good (and numerically reliable) estimate.

V. Conclusion 

In this manuscript, the recently proposed SPICE method for 
sparse estimation has been studied, and its relation to Lasso- 
type estimators has been established. This connection may 
enable the use of existing theoretical results for the Lasso 
to predict the behavior of SPICE in diverse problem settings, 
and, at the same time, the application of the computationally
efficient algorithm developed for SPICE to sparse estimation 
problems where the Lasso algorithms are currently impractical. 

As an interesting future line of research, the relation between
SPICE and the Group Lasso suggests that the former method 
could be modified to deal with general group sparsity problems 
(instead of only groups with two real variables). In addition, 
from this relation it is easy to modify SPICE in order to 
compensate for deficiencies already detected in standard Lasso 
estimators, such as lack of consistency in sparse support 
recovery, which can be fixed by adding re-weighting steps 
(see, e.g., [16]).

Appendix 
Proof of Equation (7)

In this Appendix we prove (7). Without loss of generality we can assume that the values of $k$ for which $\tilde{p}_k = 0$ have been removed from the sum. We start by rewriting (7) as

$$y^H (\tilde{A}^H \tilde{P} \tilde{A})^{-1} y = \min_{c} c^H \tilde{P}^{-1} c \quad \text{s.t.} \quad \tilde{A}^H c = y, \tag{15}$$

where $\tilde{P} := \operatorname{diag}(\tilde{p}_1, \dots, \tilde{p}_{K+N})$. We will proceed by establishing the minimum value of the right hand side of (15) and showing that it coincides with its left hand side. To this end, notice that since that optimization problem is convex, $c$ is an optimal solution of the right hand side of (15) if and only if there is a Lagrange multiplier $\lambda \in \mathbb{C}^N$ such that

$$\frac{\partial}{\partial c} \left[ c^H \tilde{P}^{-1} c + \lambda^H (\tilde{A}^H c - y) \right] = 0, \qquad \tilde{A}^H c = y,$$

or, equivalently,

$$2 \tilde{P}^{-1} c + \tilde{A} \lambda = 0, \qquad \tilde{A}^H c = y.$$

From this set of equations we obtain

$$\lambda = -2 (\tilde{A}^H \tilde{P} \tilde{A})^{-1} y, \qquad c = \tilde{P} \tilde{A} (\tilde{A}^H \tilde{P} \tilde{A})^{-1} y,$$

and the optimal cost of the right hand side of (15) gives

$$c^H \tilde{P}^{-1} c = y^H (\tilde{A}^H \tilde{P} \tilde{A})^{-1} \tilde{A}^H \tilde{P} \tilde{P}^{-1} \tilde{P} \tilde{A} (\tilde{A}^H \tilde{P} \tilde{A})^{-1} y = y^H (\tilde{A}^H \tilde{P} \tilde{A})^{-1} y,$$

which corresponds to the left hand side of (15). This concludes the proof of (7).
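
The closed-form solution obtained in the proof can be checked numerically. The sketch below uses random data (sizes and values are illustrative, and the variable names are chosen here) to confirm that $c = \tilde{P}\tilde{A}(\tilde{A}^H\tilde{P}\tilde{A})^{-1}y$ is feasible, that its cost equals the left hand side of (15), and that another feasible point, built by adding a null-space vector of $\tilde{A}^H$, has a larger cost.

```python
import numpy as np

rng = np.random.default_rng(1)
N, M = 4, 9                                       # M plays the role of K+N
Ah = rng.standard_normal((N, M)) + 1j * rng.standard_normal((N, M))   # A^H
A = Ah.conj().T
p = rng.uniform(0.5, 2.0, M)                      # strictly positive diagonal of P
P = np.diag(p)
y = rng.standard_normal(N) + 1j * rng.standard_normal(N)

R = Ah @ P @ A                                    # A^H P A
lhs = np.vdot(y, np.linalg.solve(R, y)).real      # y^H (A^H P A)^{-1} y


def cost(c):
    """Right hand side objective of (15): c^H P^{-1} c."""
    return np.sum(np.abs(c) ** 2 / p)


c_opt = P @ A @ np.linalg.solve(R, y)             # closed-form minimizer

# Another feasible point: c_opt plus a vector from the null space of A^H
_, _, Vh = np.linalg.svd(Ah)
null_basis = Vh[N:].conj().T                      # columns span null(A^H)
shift = rng.standard_normal(M - N) + 1j * rng.standard_normal(M - N)
c_other = c_opt + null_basis @ shift

print(np.allclose(Ah @ c_opt, y), np.isclose(cost(c_opt), lhs), cost(c_other) > lhs)
```

Because the objective is a strictly convex quadratic, any feasible point other than `c_opt` gives a strictly larger cost, which is what the last comparison illustrates.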

Remark 10: Equation (7) is closely related to the so-called Gauss-Markov theorem, which states that, in a linear regression framework, the least squares estimator is the minimum variance unbiased estimator [17]. In fact, let $z = \tilde{A}\theta + e$, where $\theta \in \mathbb{C}^{N}$ and $e \sim \mathcal{CN}(0, \tilde{P}^{-1})$. Furthermore, suppose we are interested in estimating $x = y^H \theta$. Then, the cost function in the right hand side of (7) can be interpreted as the variance of an estimate $\hat{x} = c^H z$ of $x$, and the corresponding constraint $\tilde{A}^H c = y$ restricts $\hat{x}$ to be unbiased, while the left hand side of (7) corresponds to the minimum achievable variance, according to the Gauss-Markov theorem.